WO 2005/031791






PCT/EP2004/010736 



WHAT IS CLAIMED IS: 

1. A method of compressing mass spectrometry data, 
comprising the steps of: 

(a) reading data corresponding to a spectrum; 
(b) carrying out a statistical analysis of noise
within the read data to obtain at least one statistical
moment or parameter related to the distribution of that
noise;

(c) determining a threshold value from the, or at
least one of the, obtained statistical parameters;

(d) identifying peaks in the spectrum by comparison of
the data points in the spectrum to the said threshold value;
and

(e) storing information related to the identified
peaks along with the obtained statistical parameters.
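
By way of illustration only, steps (b) to (e) might be sketched as follows in Python. The function name `compress_spectrum`, the dictionary layout, and the mean-plus-x-standard-deviations threshold are assumptions made for this sketch and are not recited in the claim. The noise statistics are taken over all data points, on the assumption that peaks are sparse.

```python
from statistics import mean, pstdev


def compress_spectrum(intensities, x=2.5):
    """Hypothetical sketch of steps (b) to (e) for a list of intensities."""
    # (b) Noise statistics, assuming the few peak points barely shift them.
    noise_mean = mean(intensities)
    noise_std = pstdev(intensities)
    # (c) Threshold derived from the statistics (assumed form: mean + x * std).
    threshold = noise_mean + x * noise_std
    # (d) Points above the threshold are treated as peak data.
    peaks = [(i, y) for i, y in enumerate(intensities) if y > threshold]
    # (e) Store the peak points together with the noise statistics.
    return {
        "n_points": len(intensities),
        "noise_mean": noise_mean,
        "noise_std": noise_std,
        "peaks": peaks,
    }
```

Only the peak points and two noise parameters are retained, which is where the compression arises in this sketch.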

2. The method of claim 1, wherein the step of storing 
the information related to the identified peak(s) comprises 
storing the data points of any peaks and discarding the 
noise data.



3. The method of claim 1 or claim 2, further 
comprising generating a mass spectrum subsequent to the step 
(e) of storage. 

4. The method of claim 3, further comprising
displaying the mass spectrum.

5. The method of claim 4, wherein the step of
displaying comprises displaying only the identified peaks
without also displaying the noise in the read data.

6. The method of any preceding claim, further
comprising, after the step of storage, reconstructing the
noise data based upon one or more of the stored statistical
parameters.

7. The method of claim 6 when dependent upon claim 3
or claim 4, wherein the step of generating the mass spectrum
comprises generating a mass spectrum which includes both
peak data and noise data, by combining the stored peak data
with the reconstructed noise data.

8. The method of any preceding claim, wherein the
statistical moment is selected from the list comprising an
expectation value, a standard deviation, and a variance.
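
The reconstruction of claims 6 and 7 might, purely as an example, look like the following. The function name `reconstruct_spectrum` and the use of Gaussian noise are assumptions of this sketch. The claims require only that the noise be rebuilt from the stored statistical parameters, so a different distribution could be substituted.

```python
import random


def reconstruct_spectrum(n_points, peaks, noise_mean, noise_std, seed=0):
    """Hypothetical sketch: rebuild a full spectrum from compressed data."""
    rng = random.Random(seed)
    # Regenerate synthetic noise from the stored parameters
    # (assumption: Gaussian-distributed noise).
    spectrum = [rng.gauss(noise_mean, noise_std) for _ in range(n_points)]
    # Overlay the stored peak data points at their original positions.
    for index, intensity in peaks:
        spectrum[index] = intensity
    return spectrum
```

The regenerated noise is statistically similar to, but not identical with, the discarded noise data.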

9. The method of claim 8, wherein the threshold is
EN + x·DN, where EN is the expectation value and DN is the
standard deviation, and wherein x is a multiplication
factor.

10. The method of claim 9, wherein x is about 2.5. 
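
As a worked example of the threshold EN + x·DN, the sketch below evaluates it for assumed values and, under the additional assumption of Gaussian noise (not recited in the claims), estimates the fraction of pure-noise points that would exceed the threshold for a given x. The function names are hypothetical.

```python
from statistics import NormalDist


def noise_threshold(expectation, std_dev, x=2.5):
    """Threshold EN + x * DN, with x defaulting to about 2.5."""
    return expectation + x * std_dev


def gaussian_exceedance(x=2.5):
    """Fraction of Gaussian noise points above EN + x * DN (assumption)."""
    return 1.0 - NormalDist().cdf(x)
```

With EN = 100 and DN = 8, `noise_threshold(100.0, 8.0)` gives 120.0. For Gaussian noise, `gaussian_exceedance(2.5)` is roughly 0.6 %, so under that assumption well under one noise point in a hundred would be mistaken for peak data.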

11. The method of any preceding claim, wherein the
mass spectral data is FTMS data, wherein the noise in the
read data is Weibull-distributed, and wherein step (b) of
statistically analysing comprises identifying at least one
statistical moment of the read data which best fits that
Weibull distribution.
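
One possible way of relating measured moments to a Weibull distribution is a method-of-moments fit, sketched below. This is a technique chosen for the example and is not necessarily the best-fit procedure intended by the claim. The function names and the bisection bounds are assumptions.

```python
import math


def weibull_moments(shape, scale):
    """Mean and standard deviation of a Weibull(shape, scale) distribution."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * g1, scale * math.sqrt(g2 - g1 * g1)


def fit_weibull_by_moments(sample_mean, sample_std, lo=0.2, hi=50.0):
    """Find shape and scale whose mean and std match the sample values."""
    target_cv = sample_std / sample_mean
    # The coefficient of variation falls monotonically as the shape grows,
    # so the shape can be located by bisection.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        m, s = weibull_moments(mid, 1.0)
        if s / m > target_cv:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = sample_mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale
```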



12. The method of any one of claims 1 to 10, wherein
the mass spectrometric data is time of flight mass
spectrometer (TOF MS) data, wherein the noise in the read
data is Poisson-distributed, and wherein the step (b) of
statistical analysis comprises identifying at least one
statistical moment of the read data which best fits that
Poisson distribution.
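
For Poisson-distributed counts the variance equals the mean, so a single parameter describes the noise. A minimal sketch, assuming the read data are non-negative ion counts and using the sample mean as the estimate of the Poisson rate (the function name is hypothetical):

```python
import math


def fit_poisson(counts):
    """Estimate the Poisson rate and the implied standard deviation."""
    rate = sum(counts) / len(counts)  # maximum-likelihood estimate of the rate
    return rate, math.sqrt(rate)      # for Poisson noise, variance == rate
```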

13. The method of any preceding claim, wherein the
step (b) of carrying out a statistical analysis of the noise
comprises:

(f) obtaining a best fit of the read data to a
predetermined distribution;

(g) determining, from that best fit, one or more
preliminary statistical moment(s);

(h) generating a preliminary threshold based on the,
or at least one of the, preliminary statistical moment(s);

(j) removing from the read data, all data points above
that preliminary threshold; and

(k) re-calculating a best fit of that truncated read
data to a predetermined distribution so as to obtain the
said at least one statistical moment or parameter related to
that noise in step (b).

14. The method of claim 13, further comprising: 
recursively repeating the step (j) of removing read
data above a previously determined threshold, and
recursively repeating the step (f) of obtaining a best fit,
this time of the further truncated data to a predetermined
distribution, so as to cause convergence of the or each
statistical moment.
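
The iterative truncation of claims 13 and 14 could be sketched as below. In this example the "best fit" is replaced by the sample mean and standard deviation, which is an assumption, as are the function name `estimate_noise`, the threshold form, and the stopping rule (stop when no further points are removed).

```python
from statistics import mean, pstdev


def estimate_noise(intensities, x=2.5, max_iter=50):
    """Hypothetical sketch of steps (f) to (k), repeated until stable."""
    data = list(intensities)
    m, s = mean(data), pstdev(data)           # (f), (g) preliminary moments
    for _ in range(max_iter):
        threshold = m + x * s                 # (h) preliminary threshold
        truncated = [y for y in data if y <= threshold]  # (j) remove peaks
        if len(truncated) == len(data):       # nothing removed: converged
            break
        data = truncated
        m, s = mean(data), pstdev(data)       # (k) re-fit truncated data
    return m, s
```

Each pass removes points that the previous estimate classed as peak data, so the moments move towards those of the noise alone.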

15. A method according to any one of the preceding
claims, further comprising the step of determining the
position and magnitude of the centre of any identified peaks,
and wherein step (e) comprises storing any centre positions
and magnitudes.
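
The claim does not say how the centre is to be determined. One common choice, used here as an assumption, is the intensity-weighted centroid of the peak's data points, with the largest intensity taken as the magnitude. The function name is hypothetical.

```python
def peak_centre(positions, intensities):
    """Intensity-weighted centroid and maximum intensity of one peak."""
    total = sum(intensities)
    centre = sum(p * y for p, y in zip(positions, intensities)) / total
    return centre, max(intensities)
```

Storing only the centre position and magnitude in place of every data point of the peak would reduce the stored data further.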

16. A method according to any preceding claim, wherein 
step (d) comprises identifying any peaks by recognising 
strings of three or more consecutive data points greater
than the threshold.
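
The run-length criterion of this claim might be implemented as in the following sketch, in which the function name and the inclusive (start, end) index representation are assumptions:

```python
def find_peak_runs(intensities, threshold, min_run=3):
    """Return (start, end) index pairs of runs of at least min_run
    consecutive points above the threshold (end index inclusive)."""
    runs = []
    start = None
    for i, y in enumerate(intensities):
        if y > threshold:
            if start is None:
                start = i                     # a new run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))   # run long enough: keep it
            start = None
    # Handle a run that extends to the end of the data.
    if start is not None and len(intensities) - start >= min_run:
        runs.append((start, len(intensities) - 1))
    return runs
```

Requiring several consecutive points rejects isolated noise spikes that happen to exceed the threshold.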

17. A method according to any preceding claim, 
comprising the steps of determining the positions of two or 
more identified peaks, comparing the positions to determine
whether they are part of any predetermined isotopic sequence
and, if they are, storing data points at positions
corresponding to other expected peaks within the isotopic
sequence.
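
As one hypothetical reading of a "predetermined isotopic sequence", the sketch below tests whether two peak positions are separated by the carbon-13/carbon-12 mass difference (about 1.00335 Da) divided by an integer charge and, if so, returns the positions at which further members of the sequence would be expected. The spacing rule, tolerance, charge range, and function name are all assumptions of this example.

```python
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def expected_isotope_positions(pos_a, pos_b, max_charge=4, n_extra=2,
                               tol=0.01):
    """Positions of further expected isotope peaks, or [] if no match."""
    gap = abs(pos_b - pos_a)
    for charge in range(1, max_charge + 1):
        step = ISOTOPE_SPACING / charge
        if abs(gap - step) <= tol:
            last = max(pos_a, pos_b)
            # Data points near these positions would also be stored,
            # even where they fall below the threshold.
            return [last + k * step for k in range(1, n_extra + 1)]
    return []
```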

18. A method according to any of claims 1 to 16,
comprising the steps of determining the position of any
unidentified peaks, comparing any peaks to determine any
matches to predetermined parent/fragment molecular masses
and, if any matches are found, storing data points
corresponding to other expected peaks within the
parent/fragment group.



19. A method of compressing mass spectrometric data, 
comprising the steps of:

(l) reading data corresponding to a spectrum;

(m) dividing the received data into at least two
blocks;

(n) carrying out a statistical analysis on a first of
the at least two blocks, of noise within read data within
that block, to obtain at least one statistical moment or
parameter relating to the distribution of the noise in that
block;

(p) determining a threshold value from the, or at
least one of the, statistical parameters obtained in respect
of the noise within that block;

(q) identifying peaks in that block of the spectrum,
by comparison of the data points in that block of the
spectrum to the said threshold value determined for that
block; and

(r) storing information related to the identified
peaks in that block, along with the obtained statistical
parameters for that block.

20. The method of claim 19, further comprising
repeating steps (n) to (r) for at least one further block.
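
A block-wise variant along the lines of claims 19 and 20 might be sketched as follows. Fixed-size blocks, the threshold form, and the names used are assumptions of this example. The claims do not prescribe how the data are divided.

```python
from statistics import mean, pstdev


def compress_in_blocks(intensities, block_size, x=2.5):
    """Hypothetical sketch: per-block noise statistics and peak picking."""
    blocks = []
    # (m) Divide the data into consecutive blocks of equal length.
    for start in range(0, len(intensities), block_size):
        block = intensities[start:start + block_size]
        # (n) Noise statistics for this block only.
        m, s = mean(block), pstdev(block)
        # (p) Threshold specific to this block (assumed form: mean + x * std).
        threshold = m + x * s
        # (q) Peaks in this block, indexed relative to the whole spectrum.
        peaks = [(start + i, y) for i, y in enumerate(block) if y > threshold]
        # (r) Store peaks together with this block's statistics.
        blocks.append({"start": start, "length": len(block),
                       "noise_mean": m, "noise_std": s, "peaks": peaks})
    return blocks
```

Per-block statistics allow the threshold to follow a noise level that varies across the spectrum.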

21. The method of claim 20, further comprising
identifying, from the plurality of blocks, a preferred block
upon which the steps (n) to (q), or (n) to (r), are first to
be carried out.

22. The method of claim 21, wherein the step of
identifying a preferred block is based upon the relative
likelihood of data in a particular block having a small
number of peaks in it.

23. The method of any of claims 19 to 22, wherein the
step (n) comprises obtaining a best fit of the read data for
that block to a predetermined distribution;

determining, from that best fit, one or more
preliminary statistical moment(s) for that block;

generating a preliminary threshold, based on the, or at
least one of the, preliminary statistical moment(s) for that
block;

removing, from the read data for that block, all data
points above that preliminary threshold; and

re-calculating a best fit of that truncated read data
to a predetermined distribution, for that block, so as to
obtain the said at least one statistical moment or parameter
related to that noise in step (n) for that block.

24. The method of claim 23, further comprising
recursively repeating the step of removing data above a
previously determined threshold for a particular block, and
best fitting the further truncated data to a predetermined
distribution, so as to cause convergence of the, or at least
one of the, statistical moment(s) for that block.

25. The method of claim 23 or claim 24, further
comprising repeating steps (n) to (r) of claim 19 for a next
block, and wherein the step (n) further comprises, for that
next block, removing, from the read data for that next
block, all data points above the threshold determined for
the previous block; and

re-calculating a best fit of the truncated read data in
that next block to a predetermined distribution, so as to
obtain a further statistical moment or moments for that next
block.
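
The carry-over of a threshold from one block to the next might be sketched as below. The sample mean and standard deviation stand in for the best fit, and the fallback used when every point exceeds the previous threshold is an assumption added so that the example always returns a value.

```python
from statistics import mean, pstdev


def noise_stats_from_previous(block, previous_threshold):
    """Hypothetical sketch: start the next block's noise estimate by
    discarding points above the previous block's threshold."""
    truncated = [y for y in block if y <= previous_threshold]
    if not truncated:
        # Assumed fallback: use the whole block if nothing survives.
        truncated = list(block)
    return mean(truncated), pstdev(truncated)
```

Starting from the neighbouring block's threshold means that obvious peaks are excluded before the first fit for the new block.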

26. A mass spectrum when generated from data when
compressed in accordance with the method of any one of the
preceding claims.

27. Compressed data produced in accordance with the
method of any one of the preceding claims.

28. A computer-readable medium having recorded thereon
compressed mass spectrometric data generated in accordance
with the method of any of claims 1 to 25.

29. A method of compressing mass spectrometry data 
substantially as described herein with reference to any of 
the accompanying Figures.



